Idea put forward by Ross Mounce:
See if we can extract data from figures (e.g. the coordinates of an x,y plot) and provide that data in a machine-readable form.
See https://github.com/hack4ac/hack4ac.github.io/wiki/Teams-&-Ideas#figure-mining--enrichment
Find code here: https://github.com/waltherg/figure-miner.
This is prototype code that fetches a PLoS image from figshare and detects contours in the image.
The result at the bottom shows that most elements of the plot are detected. Further processing of the obtained contours should enable us to extract the axes, their ticks and labels, and (hopefully all) data points.
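Once tick marks and data points have been located in pixel space, producing machine-readable data is mostly a change of coordinates. The following is a minimal sketch of that last step, not part of the prototype: it assumes linear axes and two known reference ticks per axis. The helper names (`make_axis_mapper`, `pixels_to_csv`) and all calibration numbers are made up for illustration.

```python
import csv
import io


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two reference ticks.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must sit at different pixels")

    def to_value(pixel):
        # Linear interpolation (or extrapolation) between the two ticks.
        return value_a + (pixel - pixel_a) * (value_b - value_a) / (pixel_b - pixel_a)

    return to_value


def pixels_to_csv(points_px, x_map, y_map):
    """Convert (x, y) pixel pairs to data coordinates and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["x", "y"])
    for px, py in points_px:
        writer.writerow([x_map(px), y_map(py)])
    return buffer.getvalue()


# Hypothetical calibration: x ticks 0 and 10 sit at pixel columns 100 and 500;
# y ticks 0 and 1 sit at pixel rows 400 and 50 (image rows grow downwards).
x_map = make_axis_mapper(100, 0.0, 500, 10.0)
y_map = make_axis_mapper(400, 0.0, 50, 1.0)
print(pixels_to_csv([(300, 225), (500, 50)], x_map, y_map))
```

Logarithmic axes would need the mapping applied to the logarithm of the tick values instead. That case is not handled here.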
import numpy as np
import cv2
from matplotlib import pylab as pl
The basic image downloading and loading below follows this tutorial and this IPython Notebook.
!wget http://files.figshare.com/1094381/Figure_1.tif
--2013-06-21 17:06:12--  http://files.figshare.com/1094381/Figure_1.tif
Resolving files.figshare.com (files.figshare.com)... 178.236.4.60
Connecting to files.figshare.com (files.figshare.com)|178.236.4.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95858 (94K) [image/tiff]
Saving to: `Figure_1.tif.3'

100%[======================================>] 95,858      --.-K/s   in 0.02s

2013-06-21 17:06:12 (4.01 MB/s) - `Figure_1.tif.3' saved [95858/95858]
image = cv2.imread('Figure_1.tif')
pl.imshow(image)
<matplotlib.image.AxesImage at 0x3f83050>
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
pl.imshow(image_gray)
<matplotlib.image.AxesImage at 0x427b110>
Tutorials for contour detection:
The example below should be modified to detect disks or ellipses. Inspiration may be found here:
And links therein in:
# [-2:] keeps this working on OpenCV 3.x, where findContours returns three values
contours, hierarchy = cv2.findContours(image_gray, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]
# Reload the figure so the contours are drawn on a fresh copy
# (image.copy() would also clone the NumPy array returned by cv2.imread)
image = cv2.imread('Figure_1.tif')
for index in range(len(contours)):
    cv2.drawContours(image=image, contours=contours, contourIdx=index, color=(255, 0, 0))
pl.imshow(image, interpolation='nearest', aspect='auto')
<matplotlib.image.AxesImage at 0x4a1de50>
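One possible way to narrow the detected contours down to disk-like markers is to score each contour by its circularity, 4πA/P², which is 1 for a perfect circle and smaller for elongated or angular shapes. The sketch below is an illustration under assumptions, not the notebook's method: it works on plain lists of (x, y) points so it can be tried without an image, and the function names and both thresholds are guesses that would need tuning on real figures.

```python
import math


def polygon_area_and_perimeter(points):
    """Shoelace area and closed perimeter of a polygon given as (x, y) pairs."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(points):
    """4*pi*area / perimeter**2: 1.0 for a perfect circle, lower otherwise."""
    area, perimeter = polygon_area_and_perimeter(points)
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * area / perimeter ** 2


def disk_like(contours, min_circularity=0.85, min_points=5):
    """Keep contours that look round; both thresholds are assumptions."""
    return [c for c in contours
            if len(c) >= min_points and circularity(c) >= min_circularity]


# Synthetic check: a 64-sided polygon approximating a circle, and a square.
circle = [(50 + 20 * math.cos(2 * math.pi * k / 64),
           50 + 20 * math.sin(2 * math.pi * k / 64)) for k in range(64)]
square = [(0, 0), (40, 0), (40, 40), (0, 40)]
print(round(circularity(circle), 3), round(circularity(square), 3))
```

To apply this to the contours found above, each OpenCV contour (an array of shape `(N, 1, 2)`) can be flattened first, for example `disk_like([c.reshape(-1, 2).tolist() for c in contours])`. OpenCV's own `cv2.contourArea` and `cv2.arcLength` compute the same two quantities directly. Contours traced on a pixel grid tend to score somewhat below 1 even for true circles, so the threshold may need lowering.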
To do: